Skip to content

performance-unnecessary-value-param#8305

Open
Jacobfaib wants to merge 5 commits intoNVIDIA:mainfrom
Jacobfaib:jacobf/2026-04-07/performance-unnecessary-value-param
Open

performance-unnecessary-value-param#8305
Jacobfaib wants to merge 5 commits intoNVIDIA:mainfrom
Jacobfaib:jacobf/2026-04-07/performance-unnecessary-value-param

Conversation

@Jacobfaib
Copy link
Copy Markdown
Contributor

@Jacobfaib Jacobfaib commented Apr 7, 2026

Description

To silence this check you can either:

  • Make arguments a const-ref
  • Move or forward the argument later on
  • Silence the check

My policy for fixing this was as follows:

  • If the API is usually used on device (or is device only), silence the check. Promoting by-value arguments to references might lead to missed optimizations from nvcc on device as it may not be able to prove that pointers are restricted. Also overwhelming majority of on-device data structures are PODs so move makes no difference for perf (not to mention correctness issues if e.g. init values come from inside a container to be modified).
  • If the API is host-only, or majority host, move or forward the arguments.
  • If moving is not possible or "hard" (big function, easy to inadvertently use after move), silence the diagnostic. In extremely rare cases (test code), where I could prove that it was host-only, I made it a const-ref.

The upshot is that this was almost never fixed via const-ref. The vast majority of violations in CUB are silenced, and a lot of them are fixed via moves in thrust.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@Jacobfaib Jacobfaib self-assigned this Apr 7, 2026
@copy-pr-bot
Copy link
Copy Markdown
Contributor

copy-pr-bot bot commented Apr 7, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-project-automation github-project-automation bot moved this to Todo in CCCL Apr 7, 2026
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Apr 7, 2026
@Jacobfaib Jacobfaib force-pushed the jacobf/2026-04-07/performance-unnecessary-value-param branch from 849453d to b53da5d Compare April 7, 2026 15:23
@Jacobfaib Jacobfaib marked this pull request as ready for review April 7, 2026 15:29
@Jacobfaib Jacobfaib requested review from a team as code owners April 7, 2026 15:29
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Apr 7, 2026
Copy link
Copy Markdown
Contributor

@fbusato fbusato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if it is worth for CUB. The PR applies NOLINT everywhere and use move which just adds verbosity

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@Jacobfaib Jacobfaib mentioned this pull request Apr 8, 2026
2 tasks
@Jacobfaib Jacobfaib force-pushed the jacobf/2026-04-07/performance-unnecessary-value-param branch from c6bb09e to 9ccde38 Compare April 8, 2026 18:18
@github-actions

This comment has been minimized.

@Jacobfaib Jacobfaib force-pushed the jacobf/2026-04-07/performance-unnecessary-value-param branch from 9ccde38 to 580cbd9 Compare April 8, 2026 22:07
@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Collaborator

@jrhemstad jrhemstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be very cautious about applying this too widely on device code functions. I've routinely seen passing arguments by const& instead of by value lead to increased local memory usage because ODR using the value by taking a reference can lead to the compiler spilling registers to local memory. Given the number of places we've had to manually disable this check, my intuition is it'd be better to just omit this check altogether.

@miscco
Copy link
Copy Markdown
Contributor

miscco commented Apr 9, 2026

I believe this PR is too big to properly review.

We should split this up into smaller parts per subproject. Also I second @jrhemstad that using references is sometimes a loss, especially for types that are cheap to copy

@Jacobfaib
Copy link
Copy Markdown
Contributor Author

@jrhemstad @miscco see the PR description, I took great care to not use references anywhere that would reasonably be used on device more often than not (e.g. CUB, or the thrust kernels). These are all silenced.

But yeah, given how much of these diffs are just // NOLINT, it seems like we can probably get away with disabling the check.

@Jacobfaib Jacobfaib force-pushed the jacobf/2026-04-07/performance-unnecessary-value-param branch 3 times, most recently from efe38e6 to 9b5f22f Compare April 9, 2026 11:30
@Jacobfaib Jacobfaib requested review from fbusato and jrhemstad April 9, 2026 11:30
@github-actions

This comment has been minimized.

@Jacobfaib Jacobfaib force-pushed the jacobf/2026-04-07/performance-unnecessary-value-param branch from c97e1c9 to 7509fe4 Compare April 9, 2026 19:22
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 9, 2026

😬 CI Workflow Results

🟥 Finished in 3h 19m: Pass: 99%/428 | Total: 14d 16h | Max: 3h 08m | Hits: 73%/555163

See results here.

@wmaxey
Copy link
Copy Markdown
Member

wmaxey commented Apr 10, 2026

I believe this PR is too big to properly review.

We should split this up into smaller parts per subproject. Also I second @jrhemstad that using references is sometimes a loss, especially for types that are cheap to copy

I think this is mostly concisely targeting Thrust. I like the idea from a correctness point of view, but understand the difficulty with actually narrowing the check to just host API. I won't immediately say no to the bulk of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

5 participants